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We have built an open-source software system for the modeling of biomolecular reaction networks, 
SloppyCell, which is written in Python and makes substantial use of third-party libraries for nu- 
merics, visualization, and parallel programming. We highlight here some of the powerful features 
that Python provides that enable SloppyCell to do dynamic code synthesis, symbolic manipulation, 
and parallel exploration of complex parameter spaces. 
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A central component of the emerging field of systems biology is the modeling and simulation of complex biomolecular 
networks, describing the dynamics of regulatory, signaling, metabolic, and developmental pathways in living organisms. 
(A small but representative example of such a network, describing signaling by G protein-coupled receptors, is shown 
in Fig. 1. Other networks under investigation can be found online[l|.) Tools for the inference of networks from 
experimental data, simulation of network behavior, estimation of model parameters, and quantification of model 
uncertainties are all necessary to this endeavor. Our research into complex biomolecular networks has revealed an 
intriguing property, namely their sloppiness: these networks are vastly more sensitive to changes along some directions 
in parameter space than along others 0, [H, U, While many tools for the simulation of biomolecular networks have 
been built, none support the types of analyses that we need to unravel this phenomenon of sloppiness. Therefore, we 
and our colleagues have implemented our own software system - SloppyCell - to support our research^. 
• i-h \ SloppyCell is an open-source software system written in Python, which provides support for model construction, 
• simulation, fitting, and validation. One important role of Python is to glue together many diverse modules providing 
q-i specific functionality. We use numpyQ and scipy[l] for numerics - particularly for integrating differential equations, 
l — 1 . optimizing parameters by least-squares fits to data, and analyzing the Hessian matrix about a best-fit set of parameters. 
We use matplotlib (a.k.a. pylab) for plotting[9(. A Python interface to the libSBML library [lfj allows us to read 
and write models in a standardized, XML-based file format, the Systems Biology Markup Language (SBML)[ll[. 
We use the pypar interface [l2j to the Message Passing Interface (MPI) library to coordinate parallel programs on 
[ distributed memory clusters. We generate descriptions of reaction networks in the dot graph specification language 
for visualization via Graphviz [l3j . We use the htmllib module from the Python standard library to generate web 
pages describing models of interest, and the smtplib module to have simulation runs send email with information on 
t^J- i their status (for those dedicated researchers who cannot bear to be apart from their work for too long). 

While Python serves admirably as a glue layer for a variety of different tools, as well as a high-level language for 
specifying and interrogating simulations, we highlight here in this paper a few powerful features of Python that enable 
us to construct highly dynamic and flexible simulation tools. 
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Code synthesis and symbolic manipulation in SloppyCell 

The dynamics of reaction networks are treated typically as either continuous and deterministic (modeling the time 
evolution of molecular concentrations) or as discrete and stochastic (by simulating many individual chemical reactions 
via Monte Carlo). In the former case, systems of ordinary differential equations (ODEs) are constructed from the 
underlying network topology and the kinetic forms of the associated chemical reactions. In practice, these ODEs are 
often derived by hand, but they need not be: all the information required for their synthesis is embedded within a 
suitably defined network, but the structure of any particular model is known only at runtime once an instance of a 
Network class has been created and specified. 

With Python, we use symbolic expressions (encoded as strings) to specify the kinetics of different reaction types, and 
loop over all the reactions defined in a given network to construct a symbolic expression for the ODEs describing the 
time evolution of all chemical species. (The reactions are depicted as arrows in Fig. 1; each reaction can be queried to 
identify those chemicals involved in that reaction (represented as shapes) , as well as an expression for the instantaneous 
rate of the reaction based on model parameters and the relevant chemical concentrations.) This symbolic expression 
is formatted in such a way that we can define a new method, get_ddv_dt(y, t), that is dynamically attached to an 
instance of the Network class using the Python exec statement. (The term "dv" in the method name is shorthand 
for "dynamical variables", i.e., those chemical species whose time evolution we are solving for.) This dynamically 
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FIG. 1: Model for receptor-driven activation of heterotrimeric G proteins The a signaling protein is inactive when bound 
to GDP and active when bound to GTP. After forming a complex with a (Hy protein, binding to the active receptor R allows 
the a protein to release its GDP and bind GTP. The complex then dissociates into back R, f3-y, and activated a. The activated 
a protein goes on to signal down-stream targets, while the f3y protein is free to bring new inactive a to the receptor. 

generated method is then used in conjunction with ODE integrators (such as scipy . integrate . odeint, which is a 
wrapper around the venerable LSODA integrator [lH Il6j . or with the variant LSODARpj} that we have wrapped up 
within SloppyCell in order to integrate ODEs with defined events). We refer to this process of generating the set of 
ODEs for a model directly from the network topology as "compiling" the network. 

This sort of technique allows us to do more than just synthesize ODEs for the model itself. Similar techniques 
allow us to construct sensitivity equations for a given model, in order to understand how model trajectories vary with 
model parameters. In order to accomplish this, we have developed a small package that supports the differentiation 
of symbolic Python math expressions with respect to specified variables, by converting mathematical expressions 
to Abstract Syntax Trees using the Python compiler module. This allows us to generate another new method, 
get_d2dv_dovdt (y , t), that describes the derivatives of the dynamical variables with respect to both time and the 
"optimizable variables" (e.g., model parameters whose values we are interested in fitting to data). By computing 
parametric derivatives analytically rather than via finite differences, we can better navigate the ill-conditioned terrain 
of the sloppy models of interest to us. 

The Abstract Syntax Trees (ASTs) created to represent the detailed mathematical form of biological networks have 
other benefits as well. We also use the ASTs to generate MpjX representations of the relevant systems of equations. In 
practice, this not only saves us alot of error-prone typing, but it is a very useful tool in debugging the implementation 
of a particular model. 

Colleagues of ours who are developing PyDSTool - a Python-based package for the simulation and analysis of 
dynamical systems[lH - have taken this type of approach to code synthesis for differential equations a step further. 
The approach described above involves generating Python-encoded right-hand-sides to differential equations, which 
are used in conjunction with compiled and wrapped integrators. For additional performance, PyDSTool supports 
the generation of C-encoded right-hand-sides, which are then dynamically compiled and linked using the Python 
distutils module and used with various ODE integrators. 

Parallel programming in SloppyCell 

Because of the sloppy structure of complex biomolccular networks, it is important not just to simulate a model for 
one set of parameters, but over large families of parameter sets consistent with available experimental data. We use 
Monte Carlo sampling to simulate a model with many different parameter sets in order to estimate model uncertainties 
(error bars) associated with predictions. Parallel computing on distributed memory clusters efficiently enables these 
sorts of extensive parameter explorations. There are several Python different packages that provide interfaces to MPI 
libraries for message passing, and we have found pypar[I3| to be especially useful in this regard. 
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Whereas message passing on distributed memory machines is inherently somewhat cumbersome and low-level, pypar 
raises the bar by exploiting built-in Python support for the pickling of complex objects. Message-passing in a low-level 
programming language such as Fortran or C typically requires the construction of appropriately sized memory buffers 
into which one must pack complex data structures, but pypar uses Python's ability to serialize (pickle) an arbitrary 
object into a Python string, which can then be passed from one processor to another and unpicklcd on the other 
side. With this we can pass lists of parameters, model trajectories returned by integrators, etc.; we also send Python 
exception objects raised by worker nodes back to the master node for further processing. (These may be generated, 
for example, if the ODE integrator fails to converge for a particular set of parameters.) 

Additionally, Python's built-in eval statement makes it easy to create a very flexible worker that can execute 
arbitrary expressions passed as strings by the master (requiring only that the inputs and return value are picklable). 
The following snippet of code demonstrates a basic error-tolerant master/worker parallel computing environment for 
arbitrarily complex functions and arguments defined in some hypothetical module named our_science: 

import pypar 

# our_science contains the functions we want to execute 
import our_science 

if pypar. rankO != 0: 

# The workers execute this loop. (The master has rank ==0.) 
while True : 

# Wait for a message from the master, 
message = pypar .receive (source=0) 

# Exit python if sent a SystemExit exception 
if isinstance (message, SystemExit): 

sys .exitO 

# Evaluate the message and send the result back to the master. 

# If an exception was raised send that instead, 
command, msg_locals = message 

localsO . update (msg_locals) 
try: 

result = eval (command) 
except X: 

result = X 
pypar . send(result , 0) 

# The code below is only run by the master 

# Evaluate our_science . f oo(bar) on each worker, getting values for 

# bar from our_science .bar_todo . 

command = 'our_science.foo(bar) ' 

for worker in ranged, pypar . size ()) : 

args = {'bar': our_science .bar_todo [worker] } 

pypar . send ( (command, args), worker) 

# Collect results from all workers 

results = [pypar . receive (worker) for worker in ranged, pypar . sized )] 

# Check if any of the workers failed, if so, raise the resulting Exception, 
for r in results: 

if isinstance (r , Exception): 
raise r 

# Shut down all the workers . 

for worker in ranged, pypar . size ()) : 
pypar. send (SystemExit () , worker) 



4 



Summary 

We have very briefly described a few of the fun and flexible features that Python provides to support the construction 
of expressive computational problem solving environments, such as those needed to tackle complex problems in systems 
biology. While any programming language can be coaxed into doing what is desired with sufficient hard work, Python 
encourages researchers to ask questions that they might not have even thought of were they working in less expressive 
environments. 
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